The continual expansion of cloud storage systems, together with the neglect of energy consumption during their design, has led to high energy consumption and low efficiency. This problem has become a main bottleneck in the development of cloud computing and big data. Most previous studies save energy by switching entire storage nodes to a low-power mode. Based on data repetition and access patterns, a new storage model built on data classification is proposed. The storage area is divided into HotZone, ColdZone and ReduplicationZone, so that each data file is stored in a zone according to its repetition and activity factor. An energy-efficient storage algorithm was then designed on top of this model. The experimental results show that the new storage model improves the energy utilization rate of the distributed storage system by nearly 25%, especially when the system load is below the given threshold.
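The zone assignment described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not give the classification rule, so the `FileStats` fields, the `hot_threshold` value, and the choice to check duplication before activity are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    name: str
    activity: float   # hypothetical activity factor in [0, 1]
    duplicate: bool   # hypothetical flag: content is already stored elsewhere


def classify(f: FileStats, hot_threshold: float = 0.5) -> str:
    """Pick a zone for one file (assumed rule, not taken from the paper)."""
    if f.duplicate:
        return "ReduplicationZone"
    if f.activity >= hot_threshold:
        return "HotZone"
    return "ColdZone"


def place(files, hot_threshold: float = 0.5):
    """Group file names by the zone chosen for each file."""
    zones = {"HotZone": [], "ColdZone": [], "ReduplicationZone": []}
    for f in files:
        zones[classify(f, hot_threshold)].append(f.name)
    return zones
```

Under this kind of split, nodes that hold only ColdZone data are the natural candidates for a low-power mode, rather than every node being treated alike.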
Tasks in big data environments, such as MapReduce jobs, typically carry data-dependent constraints. The resource selection strategy in distributed storage systems tends to choose the data block nearest to the requestor, ignoring the server's resource load state, such as CPU, disk I/O and network load. Based on the cluster structure, file division mechanism and data block storage mechanism of the distributed storage system, this paper defines the cluster-node matrix, CPU load matrix, disk I/O load matrix, network load matrix, file-division-block matrix, data block storage matrix and data block storage matrix of node status. These matrices model the relationship between a task and its data constraints. The paper then proposes an optimal resource selection algorithm with data-dependent constraints (ORS2DC), in which the task scheduling node is responsible for maintaining the base data, while MapReduce tasks and data block read tasks use different selection strategies with different resource constraints. The experimental results show that the proposed algorithm chooses higher-quality resources for tasks and improves task completion quality while reducing the NameNode's load, which lowers the probability of a single point of failure.
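The idea of load-aware selection among the nodes that hold a block can be sketched as follows. This is an illustration only, not ORS2DC itself: the abstract does not specify the scoring rule, so the weighted-sum score, the `WEIGHTS` values, and the function name `select_node` are assumptions. The storage matrix is taken here as a block-by-node 0/1 table, and each load matrix is reduced to one value per node.

```python
# Hypothetical per-task weights for (CPU, disk I/O, network) load.
# The assumption is that MapReduce tasks are more CPU-sensitive and
# block reads more I/O-sensitive; the paper may weigh them differently.
WEIGHTS = {
    "mapreduce": (0.6, 0.2, 0.2),
    "block_read": (0.1, 0.5, 0.4),
}


def select_node(block, storage, cpu, disk, net, task_type):
    """Return the least-loaded node that stores a replica of `block`.

    storage[b][n] is 1 if node n holds block b, else 0.
    cpu, disk and net hold one load value in [0, 1] per node.
    """
    w_cpu, w_disk, w_net = WEIGHTS[task_type]
    candidates = [n for n, has in enumerate(storage[block]) if has]
    if not candidates:
        raise ValueError(f"no node stores block {block}")
    return min(
        candidates,
        key=lambda n: w_cpu * cpu[n] + w_disk * disk[n] + w_net * net[n],
    )


# Made-up example: 2 blocks, 3 nodes.
storage = [[1, 1, 0], [0, 1, 1]]
cpu = [0.9, 0.2, 0.5]
disk = [0.1, 0.8, 0.3]
net = [0.2, 0.6, 0.3]
print(select_node(0, storage, cpu, disk, net, "mapreduce"))
print(select_node(0, storage, cpu, disk, net, "block_read"))
```

With these made-up loads, the two task types pick different replicas of the same block, which is the behaviour the data-dependent constraint is meant to allow.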
A storage architecture for spatial metadata based on XDR Schema was proposed, and an XML-Data Reduced (XDR) schema was created. The spatial metadata expressed in XML was mapped to the SQL Server 2000 RDBMS. The annotated XDR schema corresponds to an XML view of the database, so the database can be queried through the annotated XDR schema and the results returned in XML form.
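The round trip described above, XML metadata stored in relational tables and query results returned as XML, can be sketched in a few lines. This sketch does not use XDR schemas or SQL Server 2000; it swaps in Python's standard `xml.etree.ElementTree` and `sqlite3` modules to show the mapping idea only. The element names, the `dataset` table layout, and the sample records are made up for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Made-up sample metadata; element names are hypothetical.
METADATA_XML = """
<metadata>
  <dataset id="1"><title>Road network</title><west>116.0</west><east>117.0</east></dataset>
  <dataset id="2"><title>Land use</title><west>120.0</west><east>121.5</east></dataset>
</metadata>
"""


def load(conn, xml_text):
    """Map each <dataset> element to one row of a relational table."""
    conn.execute(
        "CREATE TABLE dataset "
        "(id INTEGER PRIMARY KEY, title TEXT, west REAL, east REAL)"
    )
    for ds in ET.fromstring(xml_text).iter("dataset"):
        conn.execute(
            "INSERT INTO dataset VALUES (?, ?, ?, ?)",
            (
                int(ds.get("id")),
                ds.findtext("title"),
                float(ds.findtext("west")),
                float(ds.findtext("east")),
            ),
        )


def query_as_xml(conn, min_west):
    """Run a relational query and rebuild the matching rows as XML."""
    root = ET.Element("metadata")
    rows = conn.execute(
        "SELECT id, title, west, east FROM dataset "
        "WHERE west >= ? ORDER BY id",
        (min_west,),
    )
    for id_, title, west, east in rows:
        ds = ET.SubElement(root, "dataset", id=str(id_))
        ET.SubElement(ds, "title").text = title
        ET.SubElement(ds, "west").text = str(west)
        ET.SubElement(ds, "east").text = str(east)
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
load(conn, METADATA_XML)
print(query_as_xml(conn, 118.0))
```

In the approach the abstract describes, the annotated schema plays the role that the hand-written `load` and `query_as_xml` functions play here: it declares which elements correspond to which tables and columns, so the mapping does not have to be coded by hand.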